77 research outputs found

    Maximum Margin Multiclass Nearest Neighbors

    We develop a general framework for margin-based multicategory classification in metric spaces. The basic work-horse is a margin-regularized version of the nearest-neighbor classifier. We prove generalization bounds that match the state of the art in sample size $n$ and significantly improve the dependence on the number of classes $k$. Our point of departure is a nearly Bayes-optimal finite-sample risk bound independent of $k$. Although $k$-free, this bound is unregularized and non-adaptive, which motivates our main result: Rademacher and scale-sensitive margin bounds with a logarithmic dependence on $k$. As the best previous risk estimates in this setting were of order $\sqrt{k}$, our bound is exponentially sharper. From the algorithmic standpoint, in doubling metric spaces our classifier may be trained on $n$ examples in $O(n^2\log n)$ time and evaluated on new points in $O(\log n)$ time.

    The Missing Mass Problem

    We give tight lower and upper bounds on the expected missing mass for distributions over finite and countably infinite spaces. An essential characterization of the extremal distributions is given. We also provide an extension to totally bounded metric spaces that may be of independent interest. Comment: 15 pages
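
For readers unfamiliar with the quantity being bounded, here is a minimal sketch, not taken from the paper. It assumes the standard definition: the missing mass after $n$ i.i.d. draws from a discrete distribution $p$ is the total probability of the outcomes that never appeared, so its expectation is $\sum_i p_i (1 - p_i)^n$. The function name `expected_missing_mass` and the uniform example are illustrative choices only.

```python
from typing import Sequence


def expected_missing_mass(p: Sequence[float], n: int) -> float:
    """Expected missing mass after n i.i.d. draws from the distribution p.

    Outcome i is unseen with probability (1 - p_i)**n, and it then
    contributes p_i to the missing mass, so the expectation is
    sum_i p_i * (1 - p_i)**n (standard definition, assumed here).
    """
    return sum(p_i * (1.0 - p_i) ** n for p_i in p)


# Illustrative example: uniform distribution over four outcomes.
uniform = [0.25] * 4
print(expected_missing_mass(uniform, 0))  # no samples yet: all mass is missing
print(expected_missing_mass(uniform, 2))  # 4 * 0.25 * 0.75**2 = 0.5625
```

The sketch only evaluates the expectation for a given finite distribution; the bounds and the extremal distributions discussed in the abstract are not reproduced here.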